Ripley, Hand Me the Cup! (Sensorimotor Representations for Grounding Word Meaning)
Abstract
People leverage situational context when using language. Rather than all information being conveyed through words, listeners can infer speakers’ meanings from shared common ground [1, 2]. For machines to engage fully in conversation with humans, they must also link words to the world. We present a sensorimotor representation for physically grounding action verbs, modifiers, and spatial relations. We demonstrate an implementation of this framework in an interactive robot that uses the grounded lexicon to translate spoken commands into situationally appropriate actions.

1. SITUATED SPOKEN LANGUAGE

Speakers use spoken language to convey meaning to listeners by leveraging situational context. Context includes many levels of knowledge, ranging from fine-grained details of shared physical environments to shared cultural norms. As the degree of shared context between communication partners decreases, the efficiency of language also decreases, since the speaker is forced to explicate increasing quantities of information that could otherwise be left unsaid. A sufficient lack of common ground can lead to communication failures.

If machines are to engage in meaningful, fluent, situated spoken dialog, they must be aware of their situational context. As a starting point, we focus our attention on physical context. A machine that is aware of where it is, what it is doing, the presence and activities of objects and people in its vicinity, and salient aspects of recent history can use these contextual factors to understand spoken language in a context-dependent manner.

A concrete example helps illustrate how a machine can make use of situational context. Consider a speech interface to the lights in a room.¹ If a person simply says, “Lights!”, the appropriate action will depend on the current state of the light. If it is already on, the command means turn off, but if it is already off, it means the opposite. In this simple example, the language understander needs access to a single bit of situational context: the current state of the light.

¹ Ignoring, for the moment, the difficult issue of microphone placement and background noise that would also need attention.

Consider a slightly richer problem, still in the domain of the light controller. How should the spoken command “softer” be interpreted by the light? Perhaps the simplest solution would be to decrease the intensity of the light by a fixed amount. Although this solution might be functional, it is not necessarily the most natural. In contrast to a fixed-interval solution, a person responding to this request would be likely to decrease the intensity by an amount that is a function of the intensity of light in the room at the time of the request. In general, many sources of light (e.g., a setting sun) may contribute to the total ambient light in the room. For a machine to leverage this situational information, we could add a light sensor to the controller that monitors ambient lighting conditions. A context-dependent interpretation of “softer” could then be defined.

1.1. Language Grounding

A necessary step towards creating situated speech processing systems is to develop representations and procedures that enable machines to ground the meaning of words in their physical environments. In contrast to dictionary definitions, which represent words in terms of other words (leading, inevitably, to circular definitions for all words), grounded definitions anchor word meanings in non-linguistic primitives. Assuming that a machine has access to its environment through appropriate sensory channels, language grounding enables machines to link linguistic meanings to elements of the machine’s environment.
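The light controller example above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the class name `LightController`, the `handle` method, the normalized 0.0 to 1.0 intensity scale, and the `softer_fraction` constant are all hypothetical, and the proportional step is just one possible "function of the intensity of light in the room".

```python
from dataclasses import dataclass


@dataclass
class LightController:
    """Hypothetical controller; all names and constants are illustrative assumptions."""

    level: float = 0.0              # lamp output, 0.0 (off) to 1.0 (full)
    softer_fraction: float = 0.25   # assumed share of room light removed per "softer"

    def handle(self, command: str, ambient: float = 0.0) -> float:
        """Interpret a command using the lamp state and sensed ambient light."""
        word = command.strip().lower().rstrip("!")
        if word == "lights":
            # A single bit of context: the same word toggles the current state.
            self.level = 0.0 if self.level > 0.0 else 1.0
        elif word == "softer":
            # The step scales with the total light in the room (lamp + ambient)
            # rather than being a fixed interval.
            step = self.softer_fraction * (self.level + ambient)
            self.level = max(0.0, self.level - step)
        else:
            raise ValueError(f"unknown command: {command!r}")
        return self.level
```

With this sketch, the same word "softer" produces a larger reduction when the ambient sensor reports a brighter room, which is the context dependence the example describes; a real system would need a perceptually motivated mapping in place of the linear one assumed here.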
From environmentally aware light controllers to car navigation systems that see the same visual landmarks as the driver, context-grounded speech processing is the tip of a very large iceberg. We believe that a large class of spoken language understanding applications may benefit from language grounding. We will refer to this class of systems as having grounded semantics in light of the explicit links of semantic representations to the machine’s physical
Similar resources
Conversational Robots: Building Blocks For Grounding Word Meaning
How can we build robots that engage in fluid spoken conversations with people, moving beyond canned responses to words and towards actually understanding? As a step towards addressing this question, we introduce a robotic architecture that provides a basis for grounding word meanings. The architecture provides perceptual, procedural, and affordance representations for grounding words. A percept...
From Words to Sentences & Back: Characterizing Context-dependent Meaning Representations in the Brain
Recent Machine Learning systems in vision and language processing have drawn attention to single-word vector spaces, where concepts are represented by a set of basic features or attributes based on textual and perceptual input. However, such representations are still shallow and fall short of symbol grounding. In contrast, Grounded Cognition theories such as CAR (Concept Attribute Representat...
Conversational Robots: Building Blocks for Grounding Word Meanings
How can we build robots that engage in fluid spoken conversations with people, moving beyond canned responses to words and towards actually understanding? As a step towards addressing this question, we introduce a robotic architecture that provides a basis for grounding word meanings. The architecture provides perceptual, procedural, and affordance representations for grounding words. A percept...
Task-dependent motor representations evoked by spatial words: Implications for embodied accounts of word meaning
Embodied accounts contend that word meaning is grounded in sensorimotor representation. In support of this view, research has found rapid motor priming effects on vertical movements for words like eagle or shoe, which differ as to whether they are typically associated with an up or down spatial direction. These priming effects are held to be the result of motor representations evoked as an obli...
Grounding the Acquisition of Grammar in Sensorimotor Representations
Drawing on data from linguistics, developmental psychology and the neurosciences, we present a computational theory of the acquisition of early grammar by infants. Based on the view that language is a mapping between form and meaning, we propose that a theory of language acquisition must be tightly integrated with a theory of the infant’s prelinguistic representations. Namely, the infant’s task...